General LUT node #39
base: main
Conversation
```cpp
// Test to see if we have access to enable_gpnpu flag
const bool gpnpu_flag = session_options.enable_gpnpu;
// ...
const ProcessBroadcastSpanFuncs functors = gpnpu_flag ? ProcessBroadcastSpanFuncs{
```
Ternary on gpnpu_flag: when the flag is set, use MlasQLinearAddFixedPoint inside the functors instead of the original MlasQLinearAdd, which is kept in the else clause.
```cpp
}
// ...
template <typename T8Bits>
Status ComputeQLinearGlobalAvgPoolFixedPoint(
```
This is identical to ComputeQLinearGlobalAvgPool except that it calls MlasQLinearGlobalAveragePoolNchwFixedPoint instead of MlasQLinearGlobalAveragePoolNchw, and MlasQLinearGlobalAveragePoolNhwcFixedPoint instead of MlasQLinearGlobalAveragePoolNhwc. As discussed with Chris, this duplication could definitely be refactored away and this function deleted if MlasQLinearGlobalAveragePoolNchw and MlasQLinearGlobalAveragePoolNhwc had a way to check the gpnpu flag themselves. I believe I did not do that because the flag lives in the session options, which are only accessible at the top level, not down inside MlasQLinearGlobalAveragePoolNhwc and MlasQLinearGlobalAveragePoolNchw.
```cpp
    bool channels_last,
    concurrency::ThreadPool* tp);
// ...
template Status ComputeQLinearGlobalAvgPoolFixedPoint<int8_t>(
```
Explicit template instantiation, mirroring the existing structure from before.
```cpp
return ComputeQLinearGlobalAvgPool(X.Data<uint8_t>(), x_scale, *(tensor_x_zero_point->Data<uint8_t>()),
                                   Y.MutableData<uint8_t>(), y_scale, *(tensor_y_zero_point->Data<uint8_t>()),
                                   N, C, image_size, channels_last_, tp);
```
If gpnpu is set, dispatch to the fixed-point version; otherwise run the original code.
```cpp
std::vector<float> output_scales = ComputeOutputScale(a_scale, b_scale, y_scale);
std::optional<MLAS_QGEMM_SCALE_BIAS_OUTPUT_PROCESSOR> scale_bias_proc_ptr;
```
Defines two additional output processors for the fixed-point path. As discussed with Chris, this could be refactored.
```cpp
    gemm_param.OutputProcessor = &*scale_bias_proc_ptr;
  }
}
static void SetPostProcessorFixedPoint(const Tensor* y_zp,
```
The two new processors defined above get used here.
```cpp
    bool ZeroMode
);
// ...
template<typename KernelType>
```
Actually, I think this is unnecessary now: it never ended up being used anywhere, and there is no float math happening in the original MlasGemmQuantKernel.
```cpp
  }
}
// ...
template<typename DataType, bool IsScalarB>
```
Fixed-point version of MlasQLinearAddKernelRawHelper; it handles the conversion of the float scales into fixed-point form.
```cpp
  }
}
// ...
void MLAS_QGEMM_SCALE_BIAS_OUTPUT_PROCESSOR_FIXEDPOINT::Process(
```
Looking back, I'm not sure what happened here: the function is labeled fixed point, but I still see floats inside it...
```cpp
MLAS_FLOAT32X4 ScaleVector = MlasBroadcastFloat32x4(Scale_);
#if !defined(MLAS_SSE2_INTRINSICS)
```
There are still float operations happening here; this needs to be revisited.
Description
Created general LUT node
Motivation and Context
Part of the ORT project.